Bandit Problems with Lévy Payoff Processes

Authors

  • ASAF COHEN
  • EILON SOLAN
Abstract

We study one-arm Lévy bandits in continuous time, which have one safe arm that yields a constant payoff s, and one risky arm that can be either of type High or Low; both types yield stochastic payoffs generated by a Lévy process. The expectation of the Lévy process is greater than s when the arm is High, and lower than s when the arm is Low. The decision maker (DM) has to choose, at any given time t, the fraction of the resource to be allocated to each arm over the time interval [t, t+dt). We show that under suitable conditions on the Lévy processes there is a unique optimal strategy, which is a cut-off strategy, and we provide explicit formulas for the cut-off and the corresponding expected payoff in terms of the data of the problem. We also examine the case where the DM holds an incorrect prior over the type of the risky arm, and we calculate the expected payoff gained by a DM who plays the optimal strategy that corresponds to the incorrect prior. In addition, we study some applications of the results: (a) we show how to price information in the one-arm Lévy bandit problem, and (b) we investigate who fares better in one-arm bandit problems: an optimist who assigns a probability higher than the true probability to High, or a pessimist who assigns a probability lower than the true probability to High.
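To make the model concrete, here is a minimal discrete-time simulation sketch of a cut-off strategy, with Brownian motion with drift standing in for the general Lévy process. All function names, parameter values, and the fixed cutoff are illustrative assumptions; the paper's explicit cut-off formula is not reproduced here.

```python
import numpy as np

def run_one_arm_bandit(p0, cutoff, mu_high, mu_low, s, sigma=1.0,
                       dt=0.01, horizon=10.0, discount=0.1, seed=0):
    """Simulate a one-arm bandit played with a cut-off strategy.

    Illustrative sketch: the risky arm's payoff is Brownian motion with
    drift mu_high (type High) or mu_low (type Low); the safe arm pays s
    per unit time. The DM allocates everything to the risky arm while
    the posterior probability of High stays at or above `cutoff`, then
    switches to the safe arm forever (the cut-off structure).
    """
    rng = np.random.default_rng(seed)
    high = rng.random() < p0            # draw the risky arm's true type
    drift = mu_high if high else mu_low
    belief, total, t = p0, 0.0, 0.0
    while t < horizon:
        disc = np.exp(-discount * t)    # discounted payoff weight
        if belief >= cutoff:
            # play the risky arm: observe a noisy payoff increment
            dx = drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            total += disc * dx
            # Bayes update via the Gaussian likelihood ratio of dx
            lh = np.exp(-(dx - mu_high * dt) ** 2 / (2 * sigma**2 * dt))
            ll = np.exp(-(dx - mu_low * dt) ** 2 / (2 * sigma**2 * dt))
            belief = belief * lh / (belief * lh + (1 - belief) * ll)
        else:
            total += disc * s * dt      # absorbed at the safe arm
        t += dt
    return total

# Average discounted payoff over many runs for an illustrative cutoff.
payoffs = [run_one_arm_bandit(p0=0.5, cutoff=0.3, mu_high=1.0,
                              mu_low=-1.0, s=0.0, seed=k)
           for k in range(200)]
print(np.mean(payoffs))
```

Once the belief drops below the cutoff it is never updated again, so the DM stays with the safe arm forever; this absorption is exactly what makes the strategy a cut-off strategy.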


Similar articles

arXiv:0906.0835v1 [math.PR] 4 Jun 2009 — Bandit Problems with Lévy Payoff Processes

Abstract. We study two-armed Lévy bandits in continuous time, which have one safe arm that yields a constant payoff s, and one risky arm that can be either of type High or Low; both types yield stochastic payoffs generated by a Lévy process. The expectation of the Lévy process when the arm is High is greater than s, and lower than s if the arm is Low. The decision maker (DM) has to choose, at a...


Reduced Variance Payoff Estimation in Adversarial Bandit Problems

A natural way to compare learning methods in nonstationary environments is to compare their regret. In this paper we consider the regret of algorithms in adversarial multi-armed bandit problems. We propose several methods to improve the performance of the baseline exponentially weighted average forecaster by changing the payoff-estimation methods. We argue that improved performance can be achie...
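The baseline this snippet refers to is, in the standard adversarial setting, the Exp3 forecaster with importance-weighted payoff estimates. Below is a minimal sketch of that baseline, assuming rewards in [0, 1]; the paper's variance-reduced estimators are not reproduced, only the estimator they aim to improve.

```python
import numpy as np

def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Exp3: exponentially weighted forecaster for adversarial bandits.

    Only the chosen arm's reward is observed, so the update uses the
    importance-weighted estimate reward / probability, which is
    unbiased but high-variance when the arm's probability is small.
    """
    rng = np.random.default_rng(seed)
    weights = np.ones(n_arms)
    for t in range(n_rounds):
        # mix the weight distribution with uniform exploration
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        reward = reward_fn(t, arm)         # observed reward in [0, 1]
        estimate = reward / probs[arm]     # importance-weighted estimate
        weights[arm] *= np.exp(gamma * estimate / n_arms)
    return weights / weights.sum()

# Toy adversarial sequence: arm 1 is slightly better on average.
final = exp3(lambda t, a: 0.6 if a == 1 else 0.4, n_arms=3, n_rounds=5000)
print(final)
```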


Showing Relevant Ads via Context Multi-Armed Bandits

We study context multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a context multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses an action based on a given context (side information) from a set of possible actions so as to...
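A common way to exploit such a Lipschitz condition is to discretize the context space and run an independent learner per bin, since the within-bin payoff variation is proportional to the bin width. The following epsilon-greedy sketch is an illustrative baseline under that idea, not the paper's algorithm; all names and parameters are assumptions.

```python
import numpy as np

def contextual_bandit(payoff_fn, n_rounds, n_ctx_bins=10, n_arms=5,
                      eps=0.1, seed=0):
    """Discretization sketch for a Lipschitz contextual bandit on [0, 1].

    Contexts arrive uniformly from [0, 1]; we partition the context
    space into bins and run an independent epsilon-greedy learner per
    bin. The Lipschitz condition keeps the discretization error
    proportional to the bin width.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros((n_ctx_bins, n_arms))
    sums = np.zeros((n_ctx_bins, n_arms))
    total = 0.0
    for _ in range(n_rounds):
        ctx = rng.random()                    # context revealed first
        b = min(int(ctx * n_ctx_bins), n_ctx_bins - 1)
        if rng.random() < eps or counts[b].min() == 0:
            a = int(rng.integers(n_arms))     # explore
        else:
            a = int(np.argmax(sums[b] / counts[b]))  # exploit bin means
        r = payoff_fn(ctx, a) + 0.1 * rng.standard_normal()
        counts[b, a] += 1
        sums[b, a] += r
        total += r
    return total / n_rounds

# Toy payoff, Lipschitz in the context: arm a prefers contexts near a/4.
avg = contextual_bandit(lambda c, a: 1.0 - abs(c - a / 4), n_rounds=20000)
print(avg)
```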


arXiv:0809.4882v1 [cs.DS] 29 Sep 2008 — Multi-Armed Bandits in Metric Spaces

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of n trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applicatio...
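For a large (here, continuous) strategy set, the simplest baseline is uniform discretization: place a fixed mesh on the strategy space and run a standard finite-armed algorithm such as UCB1 on the mesh points. A sketch on [0, 1] under that assumption follows; the paper's adaptive algorithm is not shown, and the mesh size and noise level are illustrative.

```python
import numpy as np

def ucb_on_mesh(reward_fn, n_rounds, mesh_size=20, seed=0):
    """Fixed-discretization baseline for a Lipschitz bandit on [0, 1].

    The strategy set is the interval [0, 1]; we discretize it into
    `mesh_size` points and run UCB1 on them. A Lipschitz condition
    with constant L bounds the extra regret per round from
    discretization by roughly L / mesh_size.
    """
    rng = np.random.default_rng(seed)
    arms = np.linspace(0.0, 1.0, mesh_size)
    counts = np.zeros(mesh_size)
    sums = np.zeros(mesh_size)
    for t in range(n_rounds):
        if t < mesh_size:
            i = t                               # play each mesh point once
        else:
            means = sums / counts
            bonus = np.sqrt(2 * np.log(t) / counts)
            i = int(np.argmax(means + bonus))   # UCB1 index
        reward = reward_fn(arms[i]) + 0.1 * rng.standard_normal()
        counts[i] += 1
        sums[i] += reward
    return arms[int(np.argmax(sums / counts))]

# Toy Lipschitz payoff with a peak at x = 0.7.
best = ucb_on_mesh(lambda x: 1.0 - abs(x - 0.7), n_rounds=5000)
print(best)
```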


The Nonstochastic Multiarmed Bandit Problem

In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff...



Journal:

Volume   Issue

Pages  –

Publication date: 2008